机器学习中的许多基本问题可以通过convex程序\ [\ min _ {\ theta \ in r^d} \ sum_ {i = 1}^{n} f_ {i}(\ theta),\]每个$ f_i $都是一个凸,Lipschitz函数在$ \ theta $的$ d_i $坐标的子集中支持。以随机梯度下降为例,解决此问题的一种常见方法涉及在每次迭代时对一个$ f_i $术语进行采样以取得进展。这种方法至关重要地依赖于$ f_i $的均匀性概念,该概念正式通过其状况编号捕获。在这项工作中,我们给出了一种将上述凸公式最小化为$ \ epsilon $ -Accuracy in $ \ widetilde {o}(\ sum_ {i = 1}^n d_i \ log(1 /\ epsilon)$计算,没有关于条件号的假设。以前的最佳算法独立于条件编号是标准切割平面方法,它需要$ o(nd \ log(1/\ epsilon))$渐变计算。作为推论,我们改善了Axiotis等人的评估甲骨文的复杂性,可分解性下的最小化。 (ICML 2021)。我们的主要技术贡献是一种自适应程序,可以通过切割平面和内点方法的新型组合在每次迭代中选择$ f_i $项。
translated by 谷歌翻译
动态面部表达识别(FER)数据库为情感计算和应用提供了重要的数据支持。但是,大多数FER数据库都用几个基本的相互排斥性类别注释,并且仅包含一种模式,例如视频。单调的标签和模式无法准确模仿人类的情绪并实现现实世界中的应用。在本文中,我们提出了MAFW,这是一个大型多模式复合情感数据库,野外有10,045个视频Audio剪辑。每个剪辑都有一个复合的情感类别和几个句子,这些句子描述了剪辑中受试者的情感行为。对于复合情绪注释,每个剪辑都被归类为11种广泛使用的情绪中的一个或多个,即愤怒,厌恶,恐惧,幸福,中立,悲伤,惊喜,蔑视,焦虑,焦虑,无助和失望。为了确保标签的高质量,我们通过预期最大化(EM)算法来滤除不可靠的注释,然后获得11个单标签情绪类别和32个多标签情绪类别。据我们所知,MAFW是第一个带有复合情感注释和与情感相关的字幕的野外多模式数据库。此外,我们还提出了一种新型的基于变压器的表达片段特征学习方法,以识别利用不同情绪和方式之间表达变化关系的复合情绪。在MAFW数据库上进行的广泛实验显示了所提出方法的优势,而不是其他最先进的方法对单型和多模式FER的优势。我们的MAFW数据库可从https://mafw-database.github.io/mafw公开获得。
translated by 谷歌翻译
联合学习(FL)可以培训全球模型,而无需共享存储在多个设备上的分散的原始数据以保护数据隐私。由于设备的能力多样化,FL框架难以解决Straggler效应和过时模型的问题。此外,数据异质性在FL训练过程中会导致全球模型的严重准确性降解。为了解决上述问题,我们提出了一个层次同步FL框架,即Fedhisyn。 Fedhisyn首先根据其计算能力将所有可​​用的设备簇分为少数类别。经过一定的本地培训间隔后,将不同类别培训的模型同时上传到中央服务器。在单个类别中,设备根据环形拓扑会相互传达局部更新的模型权重。随着环形拓扑中训练的效率更喜欢具有均匀资源的设备,基于计算能力的分类减轻了Straggler效应的影响。此外,多个类别的同步更新与单个类别中的设备通信的组合有助于解决数据异质性问题,同时达到高精度。我们评估了基于MNIST,EMNIST,CIFAR10和CIFAR100数据集的提议框架以及设备的不同异质设置。实验结果表明,在训练准确性和效率方面,Fedhisyn的表现优于六种基线方法,例如FedAvg,脚手架和Fedat。
translated by 谷歌翻译
虽然单图像超分辨率(SISR)方法在单次降级方面取得了巨大成功,但它们仍然在实际情况下具有多重降低效果的性能下降。最近,已经探索了一些盲人和非盲模范,已经探讨了多重降级。然而,这些方法通常在训练和测试数据之间的分布换档方面显着降低。为此,我们第一次提出了一个条件元网络框架(命名CMDSR),这有助于SR框架了解如何适应输入分布的变化。我们使用所提出的ConditionNet在任务级别提取劣化,该条件将用于调整基本SR网络(BaseNet)的参数。具体而言,我们的框架的ConditionNet首先从支撑集中学习劣化,该支持集由来自相同任务的一系列劣化图像补丁组成。然后,Adaptive BaseNet根据条件特征迅速移动其参数。此外,为了更好地提取劣化,我们提出了一个任务对比损失,以减少内部任务距离,并增加任务级别功能之间的交叉任务距离。在没有预定义的降级地图,我们的盲框可以进行一个参数更新,以产生相当大的SR结果。广泛的实验证明了CMDSR在各种盲,甚至是非盲方法上的有效性。柔性基座结构还揭示了CMDSR可以是大系列SISR模型的一般框架。
translated by 谷歌翻译
在情感计算领域的基于生理信号的情感识别,已经支付了相当大的关注。对于可靠性和用户友好的采集,电卸电子活动(EDA)在实际应用中具有很大的优势。然而,基于EDA的情感识别与数百个科目仍然缺乏有效的解决方案。在本文中,我们的工作试图融合主题的各个EDA功能和外部诱发的音乐功能。我们提出了端到端的多模式框架,1维剩余时间和通道注意网络(RTCAN-1D)。对于EDA特征,基于新型的基于凸优化的EDA(CVXEDA)方法被应用于将EDA信号分解为PAHSIC和TONC信号,以进行动态和稳定的功能。首先涉及基于EDA的情感识别的渠道时间关注机制,以改善时间和渠道明智的表示。对于音乐功能,我们将音乐信号与开源工具包opensmile处理,以获取外部特征向量。来自EDA信号和来自音乐的外部情绪基准的个体情感特征在分类层中融合。我们对三个多模式数据集(PMEMO,DEAP,AMIGOS)进行了系统的比较,适用于2级薪酬/唤醒情感识别。我们提出的RTCAN-1D优于现有的最先进的模型,这也验证了我们的工作为大规模情感认可提供了可靠和有效的解决方案。我们的代码已在https://github.com/guanghaoyin/rtcan-1发布。
translated by 谷歌翻译
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. We give an $(\epsilon_{\text{dp}}, \delta)$-differentially private algorithm which, given $n$ samples of Lipschitz loss functions, obtains near-optimal optimization error and makes $\min(n, n^2\epsilon_{\text{dp}}^2 d^{-1}) + \min(n^{4/3}\epsilon_{\text{dp}}^{1/3}, (nd)^{2/3}\epsilon_{\text{dp}}^{-1})$ queries to the gradients of these functions. In the regime $d \le n \epsilon_{\text{dp}}^{2}$, where privacy comes at no cost in terms of the optimal loss up to constants, our algorithm uses $n + (nd)^{2/3}\epsilon_{\text{dp}}^{-1}$ queries and improves recent advancements of [KLL21, AFKT21]. In the moderately low-dimensional setting $d \le \sqrt n \epsilon_{\text{dp}}^{3/2}$, our query complexity is near-linear.
translated by 谷歌翻译
New architecture GPUs like A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but efficiently utilizing it can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to effectively employ MIG, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released on https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.
translated by 谷歌翻译
In the scenario of black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries for attacking each benign example. To reduce query cost, we propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework by training a meta-generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta generator can be quickly fine-tuned based on the feedback information of the new task as well as a few historical attacks to produce effective perturbations. Moreover, since the meta-train procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability to train the meta-generator on a white-box surrogate model, then transfer it to help the attack against the target model. The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance, which is verified by extensive experiments.
translated by 谷歌翻译
Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.
translated by 谷歌翻译
In this paper, we develop an efficient multi-scale network to predict action classes in partial videos in an end-to-end manner. Unlike most existing methods with offline feature generation, our method directly takes frames as input and further models motion evolution on two different temporal scales.Therefore, we solve the complexity problems of the two stages of modeling and the problem of insufficient temporal and spatial information of a single scale. Our proposed End-to-End MultiScale Network (E2EMSNet) is composed of two scales which are named segment scale and observed global scale. The segment scale leverages temporal difference over consecutive frames for finer motion patterns by supplying 2D convolutions. For observed global scale, a Long Short-Term Memory (LSTM) is incorporated to capture motion features of observed frames. Our model provides a simple and efficient modeling framework with a small computational cost. Our E2EMSNet is evaluated on three challenging datasets: BIT, HMDB51, and UCF101. The extensive experiments demonstrate the effectiveness of our method for action prediction in videos.
translated by 谷歌翻译